Artificial Neural Network Prediction of Mortality in Cancer Patients Presenting for Radiation Therapy at a Multisite Institution

Introduction: For many decades, the management of cancer has utilized radiation therapy, which continues to evolve with technology to improve patient outcomes. However, despite the standardization of treatment plans and the establishment of best clinical practices based on prospective, randomized trials and adherence to National Comprehensive Cancer Network (NCCN) guidelines, outcomes after radiation therapy remain highly variable and depend on many factors, including patient demographics, tumor characteristics/histology, and treatment parameters. In this study, we used patient data and treatment parameters available at the time of radiation therapy to predict future outcomes using artificial intelligence (AI). Methods: Six thousand five hundred ninety-five cases of patients who completed radiation treatment were selected retrospectively and used to train artificial neural networks (ANNs) and baseline models (i.e., logistic regression, random forest, support vector machines [SVMs], and extreme gradient boosting [XGBoost]) for binary classification of mortality at multiple time points ranging from six months to five years post-treatment. A hyperparameter grid search was used to identify the optimal network architecture for each time point, with sensitivity as the primary outcome metric. Results: The median age was 75 years (range: 2-102 years); 63.8% of patients were female and 36.1% were male. ANNs performed binary mortality prediction with an accuracy greater than random chance and with greater sensitivity than the baseline models. The best-performing algorithm was the ANN, which achieved a sensitivity of 83.00% ± 4.89% for five-year mortality. Conclusion: The neural network achieved higher sensitivity than logistic regression, SVM, random forest, and XGBoost across all target variables, demonstrating the utility of a neural network model for mortality prediction on this dataset.


Introduction
Accurate prediction of mortality in cancer patients is important for choosing the optimal management strategy and an individualized treatment plan while taking quality of life into consideration. At present, most management decisions are based on patient age/functional status, tumor characteristics, and standardized treatment guidelines formulated by the National Comprehensive Cancer Network (NCCN) and the American Society for Radiation Oncology (ASTRO), which are based on scientific evidence such as phase I-III trials. Current prognostic systems use TNM staging, which incorporates a limited number of histopathologic variables (i.e., tumor size, degree of regional lymph node involvement, and presence of distant metastasis), to predict patient survival over a specified period of time. However, these systems stratify cancer patients into very broad categories and thus cannot provide a more accurate projected outcome that incorporates all relevant prognostic data in a patient's record [1]. To predict patient outcomes more accurately, additional patient variables must be taken into consideration. Artificial intelligence (AI) and machine learning (ML) therefore offer room for continued improvement in prognostication. AI/ML algorithms can explore large datasets with many inputs, identify complex patterns and relationships among multiple variables, and generate models to predict outcomes. ML has progressed significantly since Arthur Samuel first coined the term in the 1950s. ML is commonly divided into supervised learning, unsupervised learning, and reinforcement learning [2]. Supervised learning is the subcategory of ML defined by its use of labeled datasets, containing inputs and their correct outputs, to train algorithms to classify data or predict outcomes accurately. In comparison, unsupervised learning algorithms infer patterns from a dataset without reference to known or labeled outcomes.
Several commonly used supervised machine learning models are described briefly here. Logistic regression fits a logistic (sigmoid) model to multivariate input and outputs the probability of the positive class, which is thresholded to give a binary value (i.e., 1 or 0). It performs computations comparable to those of a single-node neural network. Random forest is an ensemble of individual decision trees, in which each tree casts a binary vote and the most common vote is taken as the consensus decision. A support vector machine (SVM), also known as a support vector classifier (SVC) when used for classification, is a non-probabilistic binary linear model that sorts data into two groups by finding the boundary that maximally separates the data points of the two classes. It can handle linearly as well as non-linearly separable data. Extreme gradient boosting (XGBoost) is another method based on classification trees and is popular because of its scalability and high accuracy [3][4][5].
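The statement that logistic regression behaves like a single-node neural network can be checked numerically. The sketch below uses synthetic data from scikit-learn's `make_classification` (not the study data), fits scikit-learn's `LogisticRegression`, and then recomputes its probabilities by hand as one neuron: a weighted sum of the inputs plus a bias, passed through a sigmoid.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary-classification data (illustrative only, not patient data)
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
clf = LogisticRegression().fit(X, y)


def sigmoid(z):
    """Logistic transfer function."""
    return 1.0 / (1.0 + np.exp(-z))


# One "neuron": weighted sum of the inputs plus a bias term ...
z = (X @ clf.coef_.T + clf.intercept_).ravel()
# ... passed through the sigmoid gives P(y = 1)
p_neuron = sigmoid(z)

# The hand-computed probabilities match scikit-learn's predict_proba
print(np.allclose(p_neuron, clf.predict_proba(X)[:, 1]))

# Thresholding the probability at 0.5 (equivalently z > 0) gives the binary output
y_hat = (z > 0).astype(int)
print(np.array_equal(y_hat, clf.predict(X)))
```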
Artificial neural networks (ANNs) are a class of statistical learning models that have demonstrated success in image and speech recognition and have recently become widespread in processing biomedical data [6]. An ANN is composed of computational nodes ("neurons"), each of which takes a weighted sum of a series of inputs and passes that sum through a nonlinear transfer function (e.g., the sigmoid function) to produce an output. Neurons are arranged in layers, and the outputs of each layer are passed to the next until one or more terminal output nodes are reached. The advantage of an ANN is its ability to produce a complex, nonlinear representation of a large number of input variables for use in classification or regression problems [7].
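A minimal NumPy sketch of the forward pass described above, assuming sigmoid transfer functions in every layer and randomly initialized weights. The layer sizes (four inputs, three hidden neurons, one output) are illustrative only and are not the architectures used in this study.

```python
import numpy as np


def sigmoid(z):
    """Nonlinear transfer function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def dense_layer(x, W, b):
    """One layer of neurons: each neuron takes a weighted sum of its inputs
    plus a bias, then applies the transfer function."""
    return sigmoid(x @ W + b)


rng = np.random.default_rng(0)
n_inputs, n_hidden = 4, 3  # illustrative sizes only

# Random weights; a trained network would have learned these values
W1, b1 = rng.normal(size=(n_inputs, n_hidden)), np.zeros(n_hidden)
W2, b2 = rng.normal(size=(n_hidden, 1)), np.zeros(1)

x = rng.normal(size=(2, n_inputs))    # two example input vectors
hidden = dense_layer(x, W1, b1)       # outputs of the hidden layer
output = dense_layer(hidden, W2, b2)  # single terminal output node
print(output.ravel())                 # values in (0, 1), read as P(class = 1)
```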
One study showed that ANNs incorporating the type of radiation treatment as an input were able to predict mortality in patients who received radiotherapy for breast cancer [8]. Radiation therapy (RT) is a mainstay of cancer treatment and is used with both palliative and curative intent. Radiation treatment decisions, including radiation dose, fractionation, and treatment intent, are based on established standard-of-care regimens, the tolerability of treatment, potential acute and chronic side effects, and the impact on quality of life. An open question is whether the fine-tuned parameters of radiation therapy delivery, together with patient demographics, can be used to augment a predictive model of patient mortality. This study aims to evaluate the efficacy of ANNs in predicting six-month, one-year, three-year, and five-year mortality in a dataset of patients undergoing radiation therapy for various malignancy types. We also apply several other machine learning techniques to create baseline models against which ANN performance is compared.

Materials And Methods
Institutional review board (IRB) approval was obtained to collect individual patient data from institutional electronic medical records. We collaborated with ONCORA, an integrated system that extracts data from electronic medical records across our institutional network. Patient cases were selected from the data source using the following inclusion criterion: a primary cancer of the prostate, breast, gynecological, thoracic, gastrointestinal, spine, or brain site. The dataset included cases treated with both palliative and curative intent (stages I-IV). Appendix A enumerates the data files collected.
Clinically, the Karnofsky Performance Status (KPS) is a critical variable for assessing prognosis. During the initial dataset analysis, we found 2407 missing values for the KPS score. We therefore created two separate datasets: the KPS dataset, containing only patients with a recorded KPS, and the NKPS dataset, containing all cases of the original dataset with the KPS variable removed. We organized the data in this way because the predictive models require a non-empty value for every variable.
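The split into KPS and NKPS datasets can be sketched with pandas as follows. The `kps_score` column name is taken from the variable list in this section; the helper `make_kps_nkps`, the other columns, and the toy values are hypothetical.

```python
import pandas as pd


def make_kps_nkps(df: pd.DataFrame, kps_col: str = "kps_score"):
    """Hypothetical helper that builds the two datasets described in the text.

    KPS dataset : only the cases that have a recorded KPS score.
    NKPS dataset: every case, with the KPS column removed entirely.
    """
    kps = df[df[kps_col].notna()].copy()
    nkps = df.drop(columns=[kps_col])
    return kps, nkps


# Toy cohort; 'age' and 'dead_1yr' are hypothetical column names
cohort = pd.DataFrame({
    "age": [70, 81, 55, 62],
    "kps_score": [80, None, 90, None],
    "dead_1yr": [0, 1, 0, 1],
})
kps_df, nkps_df = make_kps_nkps(cohort)
print(len(kps_df), len(nkps_df))
```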
The code written for this study was developed and tested on three separate computers: (i) an i7-3520M 2.90 GHz CPU with no dedicated graphics card (GPU); (ii) an i7-4790 3.60 GHz CPU with an Nvidia GeForce GTX 745 GPU; and (iii) an i7-8750H CPU with an Nvidia GeForce GTX 1070 GPU. The environments used to test the runnability of this study were Ubuntu 18.04 (64-bit), Windows 10 [...].

The following variables were excluded: case_id, attending_physician, date_first_fraction, cb_death, date_of_death, main_diagnosis_text, primary_diagnosis_text, secondary_diagnosis_text. The case ID is an artifact of the electronic health record and has no predictive value. We chose not to include the attending physician because of the large number of providers, even though the physician may have an impact on outcomes. Any variables that would directly reveal death information, including the time of death and complications of treatment by death, were dropped because the target variable of interest was mortality. Free-text variables were not included because they could not be encoded as useful model inputs. In addition, we dropped kps_score from the NKPS dataset only, for the reasons described above.
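A sketch of the column exclusion, assuming the extracted records are loaded into a pandas DataFrame whose columns carry the variable names listed above. The helper `drop_excluded` and the toy `age` column are hypothetical.

```python
import pandas as pd

# Variables listed in the text as excluded from the model inputs
EXCLUDED = [
    "case_id", "attending_physician", "date_first_fraction", "cb_death",
    "date_of_death", "main_diagnosis_text", "primary_diagnosis_text",
    "secondary_diagnosis_text",
]


def drop_excluded(df: pd.DataFrame, nkps: bool = False) -> pd.DataFrame:
    """Hypothetical helper: remove identifier, death-revealing, and free-text columns.

    kps_score is additionally dropped when building the NKPS dataset.
    """
    cols = EXCLUDED + (["kps_score"] if nkps else [])
    # errors="ignore" skips any listed column that is absent from the extract
    return df.drop(columns=cols, errors="ignore")


records = pd.DataFrame({
    "case_id": [101, 102],
    "age": [68, 74],
    "kps_score": [80, 70],
    "date_of_death": [None, "2019-03-01"],
})
print(list(drop_excluded(records).columns))
print(list(drop_excluded(records, nkps=True).columns))
```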

Data preprocessing
The raw data were first cleaned in Python. The possible target variables in the dataset were mortality within six months, one year, three years, and five years. In the second step, the data were preprocessed; this step consisted of one-hot encoding, binarization of the target, and minimum-maximum scaling. We used a built-in Pandas function to one-hot encode all categorical variables. Using Scikit-Learn (a machine learning library for Python), a minimum-maximum scaler was fitted to the continuous variables and used to scale them to a range of 0 to 1.
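The preprocessing step could look like the following sketch. The column names (`age`, `total_dose_gy`, `site`, `mortality_1yr`) and the "Yes"/"No" coding of the target are hypothetical, since the actual variables are listed in Appendix A. `pd.get_dummies` is pandas' built-in one-hot encoder and is assumed here to be the function meant; `MinMaxScaler` is Scikit-Learn's min-max scaler.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Toy records; every column name and value here is hypothetical
df = pd.DataFrame({
    "age": [45.0, 60.0, 75.0, 90.0],
    "total_dose_gy": [20.0, 45.0, 60.0, 70.0],
    "site": ["breast", "prostate", "breast", "brain"],
    "mortality_1yr": ["No", "Yes", "No", "Yes"],
})

# Binarize the target: "Yes" -> 1, "No" -> 0
y = (df.pop("mortality_1yr") == "Yes").astype(int)

# One-hot encode the categorical variables (one indicator column per category)
X = pd.get_dummies(df, columns=["site"])

# Min-max scale the continuous variables into the range [0, 1]
continuous = ["age", "total_dose_gy"]
X[continuous] = MinMaxScaler().fit_transform(X[continuous])
print(X)
```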
We did not tune the parameters of our neural networks manually. Instead, almost all hyperparameters, as well as the number of neurons in the hidden layers, were grid-searched using Scikit-Learn to determine the optimal network design. Each network had either one or two hidden layers, with the number of neurons in each set by the grid search parameters. The full training and evaluation processes described here were repeated for each of the four mortality variables in both datasets (KPS and NKPS), totaling eight self-contained experiments. Within the grid search, training vectors of cancer cases were passed to a Keras neural network. Keras used its TensorFlow backend to compute the output at the final neuron and the Adam optimizer to update the network weights so as to minimize the loss function, binary cross-entropy.
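The study's networks were built in Keras; a Keras version would typically stack `Dense` layers ending in a single sigmoid unit and compile with `optimizer="adam"` and `loss="binary_crossentropy"`. To keep this sketch runnable without TensorFlow, it substitutes scikit-learn's `MLPClassifier`, which also trains with Adam and, for a binary target, uses one logistic output unit and minimizes log-loss (binary cross-entropy). The hidden-layer sizes are hypothetical, and the hidden layers use `MLPClassifier`'s default ReLU activation, which may differ from the study's choice.

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Synthetic data standing in for the preprocessed cancer cases
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# One hidden layer of 16 neurons vs. two hidden layers of 16 and 8 neurons
# (hypothetical sizes; the study selected them by grid search)
one_layer = MLPClassifier(hidden_layer_sizes=(16,), solver="adam", max_iter=500, random_state=0)
two_layer = MLPClassifier(hidden_layer_sizes=(16, 8), solver="adam", max_iter=500, random_state=0)

for net in (one_layer, two_layer):
    net.fit(X, y)
    # For a binary target, scikit-learn uses a single logistic output unit
    # and minimizes log-loss, i.e. binary cross-entropy
    print(net.hidden_layer_sizes, net.out_activation_, round(net.loss_, 3))
```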
For each of the 54 hyperparameter combinations, GridSearchCV created a neural network, trained it using Keras, and ran three-fold cross-validation across several metrics: binary accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). Each metric was reported as a mean and standard deviation for that network configuration. To obtain a baseline for comparison, other classification algorithms were applied to the data: logistic regression, random forest, and SVM, as implemented in Python's Scikit-Learn library; and extreme gradient boosted trees, as implemented in Python's XGBoost module.
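A sketch of a multi-metric grid search with three-fold cross-validation, again substituting scikit-learn's `MLPClassifier` for the Keras network (with current tooling, a Keras model is usually wrapped for `GridSearchCV` with SciKeras' `KerasClassifier`). Specificity and NPV are expressed as the recall and precision of the negative class (`pos_label=0`). The four-combination grid and the synthetic data are hypothetical and much smaller than the 54-combination search described above; `refit="sensitivity"` reflects the use of sensitivity as the primary metric.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import make_scorer, precision_score, recall_score
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=300, n_features=10, weights=[0.7], random_state=0)

# The five metrics named in the text, expressed as scikit-learn scorers
scoring = {
    "accuracy": "accuracy",
    "sensitivity": make_scorer(recall_score),                           # TP / (TP + FN)
    "specificity": make_scorer(recall_score, pos_label=0),              # TN / (TN + FP)
    "ppv": make_scorer(precision_score, zero_division=0),               # TP / (TP + FP)
    "npv": make_scorer(precision_score, pos_label=0, zero_division=0),  # TN / (TN + FN)
}

# Hypothetical grid: one- and two-hidden-layer networks, two learning rates
param_grid = {
    "hidden_layer_sizes": [(8,), (16, 8)],
    "learning_rate_init": [1e-3, 1e-2],
}

search = GridSearchCV(
    MLPClassifier(solver="adam", max_iter=500, random_state=0),
    param_grid,
    scoring=scoring,
    refit="sensitivity",  # select the configuration with the best mean sensitivity
    cv=3,                 # three-fold cross-validation
)
search.fit(X, y)

# Mean and standard deviation of each metric across the three folds
for metric in scoring:
    mean = search.cv_results_[f"mean_test_{metric}"][search.best_index_]
    std = search.cv_results_[f"std_test_{metric}"][search.best_index_]
    print(f"{metric}: {mean:.3f} ± {std:.3f}")
```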
The baseline models were constructed with default library parameters, except for the random forest classifier, which was created with 1000 classification trees. These models were trained on the NKPS and KPS datasets described previously. Training and evaluation used an 80:20 split of training to testing data.
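The baseline comparison might be sketched as below, on synthetic data. The SVC uses `kernel="linear"` to match the Table 12 caption (scikit-learn's default SVC kernel is `"rbf"`), and the random forest uses 1000 trees as stated; the other models keep library defaults. XGBoost is used only if it is installed, and the `random_state` values are arbitrary choices for reproducibility.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic, mildly imbalanced data standing in for one mortality target
X, y = make_classification(n_samples=500, n_features=10, weights=[0.7], random_state=0)

# 80:20 split of training to testing data
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

baselines = {
    "logistic_regression": LogisticRegression(),
    "random_forest": RandomForestClassifier(n_estimators=1000, random_state=0),
    # kernel="linear" follows the Table 12 caption; scikit-learn's default is "rbf"
    "svm": SVC(kernel="linear"),
}
try:
    from xgboost import XGBClassifier  # optional dependency in this sketch
    baselines["xgboost"] = XGBClassifier()
except ImportError:
    pass

# Test-set sensitivity (recall of the positive, "died" class) for each baseline
sensitivity = {}
for name, model in baselines.items():
    model.fit(X_tr, y_tr)
    sensitivity[name] = recall_score(y_te, model.predict(X_te))
print(sensitivity)
```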

Dataset characteristics
The initial dataset consisted of 6625 patients; however, 30 cases were excluded due to blank values in patient variables. The dataset meeting inclusion criteria totaled 6595 patients. Survival data were available for all patients.

Study | Cancer site | AI model | Design | Results
Park et al. [...]

Predictive models built by applying advanced algorithms to large datasets can assist in personalized treatment planning and decision-making. Using a hyperparameter grid search, we identified network configurations that optimized sensitivity for six-month, one-year, three-year, and five-year mortality. The ANN achieved higher sensitivity than the other machine learning algorithms across all target variables, demonstrating the utility of an ANN model for mortality prediction on this dataset.
The implemented ANNs performed binary prediction of mortality with an accuracy greater than random chance. The neural networks consistently demonstrated higher average sensitivities and lower standard deviations than all other models across both datasets and all mortality variables. This suggests that the neural networks were able to learn patterns and heuristics from the variables present in the clinical dataset. Interestingly, the neural networks trained on the NKPS dataset appeared to perform with more precision in terms of sensitivity and specificity than those trained on the KPS dataset. We believe this may be attributable to the KPS dataset excluding 36.33% of the total cancer cases, which left fewer training examples and led to less reliable learning.
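As a consistency check on the arithmetic, the 36.33% figure corresponds to the 2407 missing KPS values counted against the initial 6625 cases, before the 30 exclusions; measured against the 6595 analyzed cases, the same count would be about 36.50%. Whether any of the 30 excluded cases were also missing KPS is not stated, so this only confirms which denominator reproduces the reported value.

$$\frac{2407}{6625} \approx 0.3633 \;(36.33\%), \qquad \frac{2407}{6595} \approx 0.3650 \;(36.50\%)$$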
The neural network achieved higher sensitivity for mortality prediction over longer time frames than over shorter ones. A plausible reason is the increased number of positive cases per mortality variable over a five-year time span. Aside from the imbalance of positive cases, predicting mortality over a shorter time frame is more difficult because random chance has a greater influence. Prediction over a longer time frame is a more tractable problem for the algorithm, since the influence of random chance is reduced and the prognostic variables used as inputs carry more weight. In contrast to sensitivity, specificity decreased over time because the neural network became more prone to making positive predictions, in effect operating with a lower detection threshold. As the "excitability" of the network increases, more type I errors (false positives) occur, lowering the specificity. This is the typical trade-off between sensitivity and specificity.
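The sensitivity/specificity trade-off can be seen by moving the decision threshold over a fixed set of model scores. The ten scores and labels below are hypothetical; lowering the threshold from 0.5 to 0.3 catches every death (higher sensitivity) at the cost of more false positives, i.e. type I errors (lower specificity).

```python
import numpy as np


def sens_spec(y_true, scores, threshold):
    """Sensitivity and specificity when 'death' is predicted for score >= threshold."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(scores) >= threshold
    tp = np.sum(y_pred & (y_true == 1))   # deaths correctly flagged
    fn = np.sum(~y_pred & (y_true == 1))  # deaths missed
    tn = np.sum(~y_pred & (y_true == 0))  # survivors correctly cleared
    fp = np.sum(y_pred & (y_true == 0))   # survivors flagged (type I errors)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical scores for 10 patients (1 = died)
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.40, 0.60, 0.70, 0.90]

for t in (0.5, 0.3):
    sens, spec = sens_spec(y_true, scores, t)
    print(f"threshold={t}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

With these values, sensitivity rises from 0.75 to 1.00 while specificity falls from 0.83 to 0.50.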
Binary accuracy decreased with each successive mortality time point, which can be explained by the dynamics of the data. As time increases, the proportion of deaths in the dataset naturally increases, which leads to increased sensitivity and decreased specificity, as noted before. Additionally, uncertainty increases because more variables or events that could affect mortality, such as comorbid conditions, accumulate over time. Therefore, if the model does not perfectly capture the relevant temporal dynamics of the dataset, accuracy will decrease with time. We consider binary accuracy a less useful evaluator of neural network performance because it depends on the prevalence of the target variable. For example, if very few patients died within six months, the network could achieve a binary accuracy close to 100% and a specificity of 100% by outputting only negative predictions, despite a sensitivity of 0%. Like binary accuracy, PPV and NPV are less useful evaluators of performance because they also depend on target variable prevalence. PPV, NPV, and binary accuracy are therefore all subject to the distribution of the data used to train the model.
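The prevalence problem can be made concrete with a degenerate classifier that always predicts survival, applied to a hypothetical cohort with 5 deaths among 100 patients: it scores 95% accuracy and 100% specificity while detecting no deaths.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical six-month cohort: 5 deaths among 100 patients (5% prevalence)
y_true = np.array([1] * 5 + [0] * 95)
y_pred = np.zeros_like(y_true)  # degenerate model that always predicts survival

# With labels=[0, 1], the flattened confusion matrix is (TN, FP, FN, TP)
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
accuracy = accuracy_score(y_true, y_pred)
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
print(accuracy, sensitivity, specificity)
```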
The ratio of positive to negative samples differs greatly among the mortality variables at different follow-up dates, so it is preferable to use a metric that can be applied consistently to all target variables. Sensitivity is computed only over the positive cases, so it does not depend directly on prevalence. However, it is worth noting that, in practical use, PPV is the most important metric for assessing the probability of mortality in a patient given a positive output from any model.
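PPV and NPV follow from sensitivity, specificity, and prevalence by Bayes' rule, which shows why they shift with the mortality rate even when the classifier itself is unchanged. The sensitivity of 0.80 and specificity of 0.70 below are illustrative values, not results from this study.

```python
def ppv_npv(sensitivity, specificity, prevalence):
    """Predictive values of a classifier with fixed sensitivity/specificity
    applied to a population with the given prevalence (Bayes' rule)."""
    tp = sensitivity * prevalence              # expected true-positive fraction
    fp = (1 - specificity) * (1 - prevalence)  # expected false-positive fraction
    fn = (1 - sensitivity) * prevalence        # expected false-negative fraction
    tn = specificity * (1 - prevalence)        # expected true-negative fraction
    return tp / (tp + fp), tn / (tn + fn)


# Same hypothetical classifier, three different mortality rates
for prevalence in (0.05, 0.20, 0.50):
    ppv, npv = ppv_npv(0.80, 0.70, prevalence)
    print(f"prevalence={prevalence:.2f}: PPV={ppv:.3f}, NPV={npv:.3f}")
```

With these inputs, PPV rises from about 0.12 at 5% prevalence to about 0.73 at 50%, while NPV falls from about 0.99 to about 0.78.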
The clearest limitation of this study was the relatively small dataset (N = 6595) of cancer cases. Although this is considered a large medical cohort, especially compared with other published studies using ML, it is small by machine learning standards. With more training samples, the networks could possibly develop a more nuanced representation of the variables and achieve higher sensitivity and specificity. Another limitation specific to this dataset was the class imbalance inherent to mortality outcomes. There were significantly more survivors than deceased patients at six months, which is not ideal for training a classification algorithm. An oversampling technique such as the synthetic minority oversampling technique (SMOTE) may be employed in the future to produce a balanced dataset for training the machine learning algorithm and potentially improve performance [30]. A further limitation was the lack of powerful hardware. Early attempts were made to run the grid search across hundreds of parameters (in excess of 700), but after several days of training it became apparent that such run times were unacceptable. With greater computing power, a wider variety of network architectures could be explored, for instance, deeper networks with three or more hidden layers.
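For illustration, the core of SMOTE can be sketched in a few lines: each synthetic minority case lies on the segment between a real minority case and one of its k nearest minority neighbors. The helper `smote` below is a simplified, hypothetical implementation, not the imbalanced-learn library's (`imblearn.over_sampling.SMOTE` with `fit_resample`), and the data are synthetic. As with any oversampling, it should be applied to the training split only, so that synthetic cases do not leak into evaluation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import NearestNeighbors


def smote(X_minority, n_synthetic, k=5, seed=0):
    """Simplified SMOTE: interpolate between minority cases and their minority neighbors."""
    rng = np.random.default_rng(seed)
    X_minority = np.asarray(X_minority, dtype=float)
    k = min(k, len(X_minority) - 1)
    # Ask for k + 1 neighbors because each point's nearest neighbor is itself
    _, idx = NearestNeighbors(n_neighbors=k + 1).fit(X_minority).kneighbors(X_minority)
    base = rng.integers(0, len(X_minority), size=n_synthetic)  # random minority cases
    nbr = idx[base, rng.integers(1, k + 1, size=n_synthetic)]  # one of their k neighbors
    gap = rng.random((n_synthetic, 1))                         # position along the segment
    return X_minority[base] + gap * (X_minority[nbr] - X_minority[base])


# Imbalanced synthetic training set: roughly 90% survivors (0), 10% deaths (1)
X, y = make_classification(n_samples=300, n_features=6, weights=[0.9], random_state=0)
X_min = X[y == 1]
n_new = int((y == 0).sum() - (y == 1).sum())
X_syn = smote(X_min, n_new)

X_bal = np.vstack([X, X_syn])
y_bal = np.concatenate([y, np.ones(n_new, dtype=y.dtype)])
print(np.bincount(y_bal))
```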

Conclusions
The use of ANNs resulted in a successful mortality prediction model that outperformed other machine learning algorithms, with greater sensitivity and specificity. Compared with similar studies, the unique features of our study include a relatively large, longitudinal sample (our institution is a multisite network of academic and community medical centers), a broad set of pathological diagnoses, and high regional diversity. Moving forward, we believe our current model could be improved by refining the prognostic variables used as network inputs, for example by including radiomics (quantitative features extracted from imaging such as CT or MRI). A convolutional neural network architecture that incorporates both prognostic and imaging data for each patient could potentially improve performance substantially. Future projects involving imaging data will require significantly more computing power than was available in this study.

FIGURE 1: This figure compares the sensitivity of neural networks trained for each of the four mortality time points on the KPS (yellow) and NKPS (green) datasets. Error bars represent standard deviation.

FIGURE 2: This figure compares the specificity of neural networks trained for each of the four mortality time points on the KPS (yellow) and NKPS (green) datasets. Error bars represent standard deviation.

FIGURE 3: This chart compares the sensitivity of the neural network trained on the KPS dataset with that of other classification algorithms. Error bars represent standard deviation.

FIGURE 4: This chart compares the sensitivity of the neural network trained on the NKPS dataset with that of other classification algorithms. Error bars represent standard deviation.

Table 2 describes the patient characteristics, and Table 3 describes mortality rates in the dataset.

TABLE 12: The binary accuracy, sensitivity, specificity, PPV, and NPV of each target variable in the NKPS dataset using Scikit-Learn's support vector classifier (SVC) with a linear kernel.
Data are presented as (mean ± standard deviation).

TABLE 15: The binary accuracy, sensitivity, specificity, PPV, and NPV of each target variable in the KPS dataset using Scikit-Learn's Random Forest Classifier.
Data are presented as (mean ± standard deviation).

TABLE 17: The binary accuracy, sensitivity, specificity, PPV, and NPV of each target variable in the KPS dataset using XGBoost. Data are presented as (mean ± standard deviation).
In the ensemble machine learning models, the best results were obtained with the SVM, ANN, BN, and KNN classifiers, with accuracies of 92.8% in the testing set and 90% in the validation set.